On-line Hot Topic Recommendation Using Tolerance Rough Set Based Topic Clustering
Authors
Abstract
In this paper we present an online hot topic detection and label extraction method developed for our hot topic recommendation system. A new topical feature selection method compresses the feature space to a size suitable for an online system. The tolerance rough set model is then used to enrich the small set of topical feature words into a topical approximation space. Web pages are clustered into groups according to a distance defined on this topical approximation space, and the groups are then merged according to their document overlap. Topic labels are extracted from the topical approximation space, enriched with the useful but high-frequency topical words that the clustering process dropped. The experiments show that our method generates more informative classes and more topical class labels, and alleviates the topic drift caused by non-topical and noise words.
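The enrichment step can be illustrated with the standard tolerance rough set formulation, in which a term's tolerance class is the term itself plus every term that co-occurs with it in at least `theta` documents, and a document's upper approximation is the set of terms whose tolerance class overlaps the document. The sketch below is a minimal illustration under those assumptions, not the authors' implementation: the function names, the threshold `theta`, the toy corpus, and the use of Jaccard distance on the enriched sets are all hypothetical choices, since the abstract does not specify the distance or the feature selection method.

```python
from collections import defaultdict
from itertools import combinations


def tolerance_classes(docs, theta):
    """Map each term to its tolerance class: the term itself plus every
    term that co-occurs with it in at least `theta` documents."""
    cooc = defaultdict(int)
    vocab = set()
    for doc in docs:
        terms = set(doc)
        vocab |= terms
        # Count each unordered term pair once per document.
        for a, b in combinations(sorted(terms), 2):
            cooc[(a, b)] += 1
    classes = {t: {t} for t in vocab}
    for (a, b), count in cooc.items():
        if count >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes


def upper_approximation(doc, classes):
    """Return every term whose tolerance class overlaps the document."""
    terms = set(doc)
    return {t for t, cls in classes.items() if cls & terms}


def jaccard_distance(a, b):
    """Jaccard distance between two term sets (an assumed distance)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


# Hypothetical toy corpus: each document is a list of topical feature words.
docs = [
    ["earthquake", "rescue", "sichuan"],
    ["earthquake", "rescue", "relief"],
    ["olympics", "beijing", "torch"],
    ["olympics", "beijing", "medal"],
]
classes = tolerance_classes(docs, theta=2)

# Two short pages that share no words become identical after enrichment.
page_a = upper_approximation(["earthquake"], classes)
page_b = upper_approximation(["rescue"], classes)
print(jaccard_distance({"earthquake"}, {"rescue"}))  # distance on raw terms
print(jaccard_distance(page_a, page_b))              # distance after enrichment
```

In this toy corpus, "earthquake" and "rescue" co-occur in two documents, so each falls in the other's tolerance class. Two pages that mention only one of the two words are therefore disjoint on raw terms but coincide in the enriched space, which is the effect the abstract relies on when clustering with a small feature set.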
Similar resources
Clustering of Web Usage Data Using Fuzzy Tolerance Rough Set Similarity and Table Filling Algorithm
Web Usage Mining is the application of data mining techniques to learn usage patterns from Web server log file in order to understand and better serve the requirements of web based applications. Web Usage Mining includes three most important steps namely Data Preprocessing, Pattern discovery and Analysis of the discovered patterns. One of the most important tasks in Web usage mining is to find ...
Improving Quality of Search Results Clustering with Approximate Matrix Factorisations
In this paper we show how approximate matrix factorisations can be used to organise document summaries returned by a search engine into meaningful thematic categories. We compare four different factorisations (SVD, NMF, LNMF and K-Means/Concept Decomposition) with respect to topic separation capability, outlier detection and label quality. We also compare our approach with two other clustering ...
A Probabilistic Topic Model Based on Local Word Relations in Overlapping Windows
A probabilistic topic model assumes that documents are generated through a process involving topics and then tries to reverse this process, extracting the topics from the given documents. A topic is usually assumed to be a distribution over words. LDA is one of the first and most popular topic models introduced so far. In the document generation process assumed by LDA, each document is a distribution o...
متن کاملTopological structure on generalized approximation space related to n-arry relation
The classical structure of rough set theory was first formulated by Z. Pawlak in [6]. The foundation of its object classification is a binary equivalence relation and its equivalence classes. The upper and lower approximation operations are two core notions in rough set theory. They can also be seen as a closure operator and an interior operator of the topology induced by an equivalence relation on a u...
A Comparative Study of Topic Models for Topic Clustering of Chinese Web
Topic models are an increasingly useful tool for analyzing semantic-level meaning and capturing topical features. However, there is little research comparing topic models. In this paper, we describe our comparative study of three topic models in the extrinsic application of topic clustering. The topic model distance is defined on the converged parameters of topic models, w...
Journal title:
- JCP
Volume 5, Issue
Pages -
Publication date 2010